THE CARRYBACK PROCESSING ASSISTANT:
A STUDY IN CASE PROCESSING 

by 

Richard K. Schreiber, Internal Revenue Service 
Ramesh Patil, Massachusetts Institute of Technology 



Abstract 

Case processing involves amending, 
correcting, or updating an existing 
account. It is the least automated type 
of processing in the Internal Revenue 
Service. This article describes how a 
constraint network and a process called 
"discrepancy resolution" can be used to 
limit the amount of input data needed 
to process a case. We present results 
achieved by the Carryback Processing 
Assistant, a system designed to partially 
automate the processing of the carryback 
claims received by an IRS Service Center. 



A Characterization of Case Processing 

We have analyzed the work of an IRS Service 
Center into the categories of case processing 
and input processing. A large workforce in the 
ten IRS processing centers is devoted to each 
type of task. Input processing centers around 
the entry of data into the computer system and 
is characterized by task specialization or 
horizontal integration. An example is the initial 
processing of tax returns. Case processing on 
the other hand involves amending, correcting, or 
updating an existing case or account. Making an 
adjustment to a tax account in response to a 
taxpayer request is an example of case 
processing. 

The difference between case processing and 
input processing lies in the amount of data to be 
entered, the complexity of the analysis to be 
performed, and whether entered data will be 
saved for future use. For case processing, the 
source document and mixed media documents 
such as electronic transcripts and returns from
the file are assembled and the case is worked at
a single point in time. For input processing each 
step is separated in time which results in 
additional steps to retrieve documents and make 
corrections. These differences may be clearly 
seen in Figure 1. 

Almost no case processing tasks have been 
automated. Normal automation of a case 
processing task would involve converting it into 
an input processing task. However, the 
relatively large amount of data needed to 
resolve a case and the level of expertise 
required to perform analysis of the case have 
proved in the past to be effective barriers 
against automation. 

The rapid growth of expert systems technology 
has changed this equation by providing the tools 
and techniques to capture the levels of 
expertise needed for case processing. Goal 
directed questioning can limit the amount of 
input data needed to resolve a case, but 
questions must greatly restrict the amount of 
data needed in order to overcome the inherent 
slowness of the Question & Answer technique.
In the domain of tax processing the expense of
obtaining input data continues to loom as a 
formidable barrier. 

In the following sections we will present our 
solution to the problem of case processing. We 
will describe a program designed to assist in 
the processing of one type of case, tentative 
carrybacks. After examining the components of 
this system, we will describe the results it 
achieved in a field test, and conclude by 
describing why our techniques may apply to 
many case processing domains. 
Case Processing

    Analyze input document & correct inconsistencies.
    Analyze file and correct inconsistencies with input document(s).
    Generate transaction to post to file.

Input Processing

    Data enter input document.
    Analyze input document & identify inconsistencies.
    Subsequent Processing (Error Correction Loop): Retrieve document &
        correct inconsistencies.
    Format entered data into transaction to post to file.
    Document Posting: Analyze file and identify inconsistencies with
        input document(s).
    (Unpostable Loop): Retrieve document & correct inconsistencies
        (Unpostables).
    Post corrections to file.

Figure 1. List of tasks associated with case processing and input
processing. In this model the tasks differ because data entry,
analysis of the source document, and analysis of the file occur at
different times for input processing but occur simultaneously
for case processing.



Design For a Case Processing Assistant 

The primary determinant of the shape of case 
processing is the lack of input data and the 
expense of acquiring it. A basic strategy is to 
enter a small subset of the data and use it to 
generate an approximation of the complete data 
set. This approach seems to be particularly 
powerful in the IRS forms domain. 

Once the minimum data set has been entered the 
technician must define goals to the assistant 
either explicitly or implicitly. These goals form 
the basis for testing the results generated by 
the assistant and should cause it to repair its 
own calculations using supplied target figures. 
In a more elaborate technique, the goals 
themselves can be called into question when the 
system is unable to find a causal basis for 
supporting them. 

Expert systems have proven that they can 
deliver expert analysis and can pursue an
intelligent strategy for acquiring data.
However, unless the system is provided with a
complete set of input data it cannot be
guaranteed to reach a satisfactory conclusion.
This indeterminacy suggests that some explicit
control or directions from the user will be 
necessary and for this reason we have looked for 
a partial solution in the form of an expert 
assistant. 

The assistant paradigm is well established and 
we will only briefly comment on it. An 
assistant provides an intelligent environment.
It makes assumptions, performs certain types of
operations automatically, and provides a number
of computational tools which may be used by the
technician or at the technician's direction.
The assistant is capable of taking actions or 
proposing actions for review by the technician. 
While the assistant is not capable of 
independently processing a case, it works 
interactively with the technician toward that 
goal. 






[Figure 2: schematic with boxes labeled Expert System Tasks, Logical
Constraints, User Control, Forms Interface, Discrepancy Resolution, and
Numeric Constraints.]

FIGURE 2. The user interfaces with the assistant system either through the
forms interface or through one of several expert system tasks. The underlying
data is maintained in a numeric constraint network or its logical extension.
The routines to resolve discrepancies between the assistant's computations
and those of the taxpayer are called by the user through the forms interface.



The Carryback Domain

A carryback is an income tax claim (filed on an
Amended Return, Form 1040X) or a tentative
application (filed on Form 1045). A carryback is
generated by negative income -- called a net
operating loss -- or by tax credits that exceed
income in a given year. The excess loss or
credit is then "carried back" to a previous year.
A loss is used to reduce adjusted gross income
in the previous year, while a credit is applied
directly against tax. The result in both cases is
a tax refund.

The procedures governing the computation and
application of net operating loss are complex,
so much so that one study showed that
forty-nine percent of the carryback cases contain
errors. It takes approximately one year of
training and experience to produce a
knowledgeable tax examiner. It takes another
year of training and specialization to produce a
knowledgeable carryback examiner.

The basic processing scenario for working a
case is as follows:

Received cases are preprocessed to obtain
printed account transcripts and to review
some related issues such as timeliness
and completeness of the case.

Processable cases are assigned to a tax 
examiner who analyzes the case, verifies 
the computations, and determines whether 
additional case documents need to be 
retrieved from the file or whether 
processing on the case needs to be 
coordinated with any other units. 

When analysis of the case is completed, 
closing transactions are input to adjust 
the tax, modify the data base account and 
issue a refund to the taxpayer. 

System Components 

A user interacts with the Carryback Assistant 
and directs processing through the forms 
interface which sits on top of a logical and 
numeric constraint system. The constraint 
system generates the values for several of the 
forms. Discrepancy resolution or debugging can 
be initiated by the user or by the constraint 
system. An expert task system uses the logical 
constraint system, but interacts directly with 
the user. A schematic of the Carryback 
Processing Assistant is given in Figure 2. 






[Figure 3: flow of processing steps. Step 2: Input Form 1040 / Schedule A.
Step 3: Review NOL Calculation. Step 4: Review Carryback Calculation.
Step 5: Review Closing Transactions.]

FIGURE 3. A case is established in Step 1 and provided with a minimum data set in Step 2.
The assistant generates an approximately complete data set for use in Steps 3, 4, and 5. If
needed, discrepancy resolution is applied in Step 3 to arrive at the correct Net Operating
Loss (NOL). The NOL is applied to previous tax years in Step 4, although the examiner may
need to supply the assistant with additional data about those years. Correct closing
transactions are generated in Step 5. There is a separate forms interface screen for each step.




Forms Interface. The Carryback Assistant is 
implemented as a series of forms associated 
with the case. This includes the Individual
Income Tax Form, Form 1040 (pages one and 
two); the Schedule of Deductions, Schedule A; 
the Net Operating Loss Computation, Form 1045 
- Schedule A; the Application of Carryback 
Credits, Form 1045; and other supporting forms 
such as the Alternative Minimum Tax 
Computation, Form 6251. 

An interesting issue in designing screen forms 
is deciding the appropriate degree of forms 
mimicry. If we choose to exactly represent each 
form, a complete form cannot be displayed on a 
single screen. At every decision point where 
there is a conflict between exactly representing 
a form and making computations explicitly 
available to the user or providing a concise 
display, we have followed the most functional 
line. 

Our approach in designing forms has been to 
eliminate scrolling, to combine displays, to 
provide redundant displays, and to describe 
computations as fully as possible. We have 
eliminated scrolling by putting each form on a 
separate screen. This requires a more abstract 
form and detracts from the visual authenticity 
of the displayed form, but it accelerates 
processing and in practice we have found it to
communicate all necessary information to the
user.

At the same time, we have added additional 
information to several forms in order to show 
explicitly how the assistant arrived at a 
particular total. While an explanation for a 
given computation may be requested, wherever 
possible we have tried to display all items 
entering into a computation. 

The user controls the system by moving from 
screen to screen. A common sequence is shown 
in Figure 3. Constraints are applied only when 
the local network for a particular screen is 
activated. Debugging is also initiated when the 
user identifies a discrepancy between the 
assistant's and the taxpayer's computations.
The user is also provided with the ability to 
override any of the assistant's computations or 
conclusions. 

Numeric Constraints. All variables are linked
using a constraint system we developed for this
project. The constraint system acts as an
automatic programmer in converting a
declarative description into rules which are
fired whenever the value of a variable changes.
The advantage of this constraint system is that
building a network from a declarative
description is easy and the network can easily
be changed. Figure 4 gives an example of a
small network and the declarative description
from which it is constructed. A disadvantage of
constraints is that quiescence must be achieved
after each input item is entered, and that in
very large networks reaching quiescence may be
time consuming. To avoid the latter problem we
have partitioned our network into several
smaller networks.

a.) (agi - ((income + (wages interest)) (adjustments + (keogh moving))))

b.) (return-signed and (signed-by-taxpayer (spouse-satisfied or
    ((not-joint-return if (= filing-status 3)) signed-by-spouse))))

Figure 4. Two examples of how a network can be constructed from a
declarative description. Numeric constraints are applied to a, while
logical constraints are applied to b. Data is entered only at the leaf
nodes.
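
The numeric description in Figure 4a can be illustrated with a minimal Python sketch. This is not the constraint system described above, only an approximation of the idea: the nested-tuple description format, the class name `Network`, the method `set_leaf`, and the sample amounts are all assumptions made for the example. Entering a value at a leaf re-fires the dependent rules until no value changes, which is the quiescence condition mentioned in the text.

```python
import operator

# Declarative description modeled on Figure 4a: (node, op, operands).
# The nested-tuple format is an assumption made for this sketch.
AGI_DESCRIPTION = (
    "agi", "-", [
        ("income", "+", ["wages", "interest"]),
        ("adjustments", "+", ["keogh", "moving"]),
    ],
)

OPS = {"+": operator.add, "-": operator.sub}


class Network:
    def __init__(self, description):
        self.values = {}      # node name -> current value (None = not known)
        self.rules = {}       # derived node -> (op, operand names)
        self.dependents = {}  # node -> derived nodes that use it
        self._build(description)

    def _build(self, desc):
        """Recursively register a node and return its name."""
        if isinstance(desc, str):            # leaf node: data is entered here
            self.values.setdefault(desc, None)
            return desc
        name, op, operands = desc
        names = [self._build(d) for d in operands]
        self.values[name] = None
        self.rules[name] = (op, names)
        for n in names:
            self.dependents.setdefault(n, []).append(name)
        return name

    def set_leaf(self, name, value):
        if name in self.rules:
            raise ValueError(f"{name} is derived; enter data only at leaf nodes")
        self.values[name] = value
        self._propagate(name)

    def _propagate(self, changed):
        """Fire dependent rules until no value changes (quiescence)."""
        queue = [changed]
        while queue:
            node = queue.pop()
            for parent in self.dependents.get(node, []):
                op, names = self.rules[parent]
                args = [self.values[n] for n in names]
                if any(a is None for a in args):
                    continue                 # not enough data yet
                result = args[0]
                for a in args[1:]:
                    result = OPS[op](result, a)
                if result != self.values[parent]:
                    self.values[parent] = result
                    queue.append(parent)


net = Network(AGI_DESCRIPTION)
for leaf, amount in [("wages", 30000), ("interest", 500),
                     ("keogh", 2000), ("moving", 1000)]:
    net.set_leaf(leaf, amount)
print(net.values["income"], net.values["adjustments"], net.values["agi"])
```

Because the rules are derived from the description, changing the network amounts to editing the description, which is the ease-of-change property claimed for the approach.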

The constraint system may also serve other 
purposes. For example, we also use it to control 
the graphic redisplay of changing values and the
computation of complementary values. This is 
achieved by independently attaching rules or 
operators to nodes. We perform this when the 
network is initialized. 
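
One way to picture rules attached independently to nodes is an observer-style hook, sketched below in Python. The paper does not say how operators are attached or what a "complementary value" is in detail, so the `Node` class, the `attach` method, the `screen` list standing in for the display, and the reading of a complementary value as "the amount remaining out of a fixed total" are all assumptions for illustration.

```python
class Node:
    """A network node that carries independently attached operators."""

    def __init__(self, name, value=None):
        self.name = name
        self.value = value
        self.operators = []          # callables run as op(node) after a change

    def attach(self, op):
        self.operators.append(op)

    def set(self, value):
        if value == self.value:
            return                   # no change, so nothing fires
        self.value = value
        for op in self.operators:
            op(self)


CARRYBACK_TOTAL = 10000              # hypothetical amount available to apply
screen = []                          # stands in for the graphic display

applied = Node("applied")
remaining = Node("remaining")        # hypothetical complementary value

# Attached once, when the network is initialized.
applied.attach(lambda n: screen.append(f"{n.name}: {n.value}"))
applied.attach(lambda n: remaining.set(CARRYBACK_TOTAL - n.value))
remaining.attach(lambda n: screen.append(f"{n.name}: {n.value}"))

applied.set(6500)
print(screen)
```

Keeping display and complementary-value logic in attached operators leaves the constraint rules themselves free of interface concerns.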



Logical Constraints. We have extended the 
constraint system discussed above to process 
the logical operators "and," "or," "not," "if," and
"then". "If" is the main device for linking 
numerically constrained values with logically 
constrained values, while "then" links the 
logical network to the numeric network. A 
sample logical network is shown in Figure 4. 

The constraint logic is four-valued with the 
values of "True", "False", "Undetermined", and 
"Unknown". "Unknown" means that the user has 
been queried about the value of a particular 
variable, but has been unable to supply a value. 
"Undetermined" means that the user has not been 
queried. 
In practice, "unknowns" usually occur when the
taxpayer has not supplied one or more items of 
required data. Rather than rejecting the case, 
the examiner will process the case if the 
"required" data item would not change the net 
processing result. If a leaf node is set to 
"unknown", conjunctive clauses cause the value 
"unknown" to be propagated forward. Only 
disjunctive clauses offer the possibility of 
leading to a conclusion other than "unknown". 
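
The propagation behavior described above can be sketched as a small four-valued logic in Python. Only two properties come from the text: "unknown" dominates a conjunction, and a disjunction may still reach another conclusion. The remaining precedence choices (for example, that "false" outranks "undetermined" in a conjunction), the names `TV`, `and4`, `or4`, `not4`, and the treatment of "not" are assumptions made for this example.

```python
from enum import Enum


class TV(Enum):
    TRUE = "true"
    FALSE = "false"
    UNDETERMINED = "undetermined"    # the user has not been queried
    UNKNOWN = "unknown"              # queried, but no value could be supplied


def and4(*values):
    # "unknown" dominates a conjunction, as the text describes.
    for dominant in (TV.UNKNOWN, TV.FALSE, TV.UNDETERMINED):
        if dominant in values:
            return dominant
    return TV.TRUE


def or4(*values):
    # A disjunction can still conclude "true" despite an "unknown" operand.
    for dominant in (TV.TRUE, TV.UNDETERMINED, TV.UNKNOWN):
        if dominant in values:
            return dominant
    return TV.FALSE


def not4(value):
    # Assumed: negation leaves "undetermined" and "unknown" unchanged.
    flipped = {TV.TRUE: TV.FALSE, TV.FALSE: TV.TRUE}
    return flipped.get(value, value)


def return_signed(signed_by_taxpayer, not_joint_return, signed_by_spouse):
    """The logical network of Figure 4b, with leaf values passed in directly."""
    spouse_satisfied = or4(not_joint_return, signed_by_spouse)
    return and4(signed_by_taxpayer, spouse_satisfied)


# The spouse's signature is unknown, but the return is not joint:
print(return_signed(TV.TRUE, TV.TRUE, TV.UNKNOWN))
# The spouse's signature is unknown and the return is joint:
print(return_signed(TV.TRUE, TV.FALSE, TV.UNKNOWN))
```

In the first call the disjunction absorbs the missing item, which mirrors the examiner's practice of processing a case when the "required" item would not change the result.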

Forward chaining is accomplished by the normal 
operation of the constraint system, while 
backward chaining is defined as a separate 
function. Backward chaining is initiated when 
the user wishes to activate an expert system 
task. The backward chainer searches for leaf 
nodes with the value of "undetermined" and 
triggers questions to the user. Although the 
backward chainer can be applied to any subgoal, 
can be exited and re-entered at any point in the 
processing, or can be terminated as soon as a 
conclusion other than "undetermined" is reached, 
it is generally useful in our domain to stop 
processing if the conclusion is "true" and to 
continue processing if the conclusion is "false" 
in order to identify all conditions that would 
have to be corrected to reach a "true" 
conclusion. 
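
A minimal Python sketch of the questioning strategy just described follows. It assumes a goal written as nested `(operator, children)` tuples, string truth values, and a caller-supplied `ask` function; none of these names come from the original system. The chainer queries "undetermined" leaves, stops as soon as the goal is "true", and otherwise keeps going so that every false condition is reported.

```python
TRUE, FALSE, UNDETERMINED, UNKNOWN = "true", "false", "undetermined", "unknown"


def evaluate(goal, facts):
    """Four-valued evaluation; precedence beyond 'unknown' is an assumption."""
    if isinstance(goal, str):
        return facts.get(goal, UNDETERMINED)
    op, children = goal
    values = [evaluate(child, facts) for child in children]
    order = (UNKNOWN, FALSE, UNDETERMINED) if op == "and" else (TRUE, UNDETERMINED, UNKNOWN)
    for dominant in order:
        if dominant in values:
            return dominant
    return TRUE if op == "and" else FALSE


def leaves(goal):
    if isinstance(goal, str):
        yield goal
    else:
        for child in goal[1]:
            yield from leaves(child)


def backward_chain(goal, facts, ask):
    """Query undetermined leaves; stop early only on a 'true' conclusion."""
    for leaf in leaves(goal):
        if evaluate(goal, facts) == TRUE:
            break
        if facts.get(leaf, UNDETERMINED) == UNDETERMINED:
            facts[leaf] = ask(leaf)
    conclusion = evaluate(goal, facts)
    failing = [leaf for leaf in leaves(goal) if facts.get(leaf) == FALSE]
    return conclusion, failing


def scripted(answers, asked):
    """Stand-in for the user: records each question and answers from a dict."""
    def ask(leaf):
        asked.append(leaf)
        return answers.get(leaf, UNKNOWN)
    return ask


GOAL = ("and", ["signed-by-taxpayer",
                ("or", ["not-joint-return", "signed-by-spouse"])])

asked = []
answers = {"signed-by-taxpayer": TRUE, "not-joint-return": TRUE}
result = backward_chain(GOAL, {}, scripted(answers, asked))
print(result, asked)
```

In this run the third question is never asked, because the goal is already "true" after two answers. With all three answers "false", every leaf is queried and all three are returned as conditions to correct.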

Discrepancy Resolution. Reasoning about causal 
information, measuring it against some target 
value(s), and applying debugging procedures to 
the causal links is well described by Reid 
Simmons as the generate, test, and debug 
paradigm. In the debugging paradigm the 
primitive nodes in a causal net are located, 
discrepancies are identified by regressing 
through the nodes, and the discrepancies are
repaired by applying one of six domain
independent techniques. Multiple discrepancies
can be resolved with hill climbing.

Discrepancy resolution refers to resolving 
differences between values generated by the 
system and values supplied by the taxpayer. 
This process is close in spirit to generate, test, 
and debug; but computationally it is quite 
different. The major difference is that the goal 
or target figure is not necessarily correct. 
Either the system or the taxpayer may be 
correct, or both may be in error. Moreover, we
do not have access to each item in the
taxpayer's computation - normally, only a total 
value is provided. A partial solution is not 
helpful. If we cannot identify all of the causes 
for a discrepancy there is little reason to 
believe a partial solution should be preferred to 
the initial system assumptions. 

Discrepancies arise because the system has 
made one or more incorrect assumptions, or because it
lacks input data. Correspondingly, the taxpayer 
may make errors in classifying income or 
deductions or by omitting totals that should be 
included in a computation. The taxpayer may 
also make computational errors. Occasionally, 
these may be true math errors, but they are 
more frequently caused by a misunderstanding 
of the various computation rules. 

To resolve a discrepancy we identify the 
primitive nodes that could have contributed to 
the discrepancy and then perform a closure on 
all possible combinations of values for these 
nodes. A closure yields $\sum_{k=0}^{n} \binom{n}{k}$, or $2^n$, possibilities.
Potentially, the number of combinations to be
considered can be large - the number of
categories into which income from the Income
Tax Return, Form 1040, can be analyzed is 28, so
the number of combinations would exceed 200
million. In fact the average number of
combinations is more on the order of $2^8$, and
simple filters to eliminate impossible values
routinely reduce the number to $2^5$.
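
The closure can be sketched in Python as an enumeration of every subset of the candidate primitive nodes. The sketch assumes one particular reading of a discrepancy, namely that the taxpayer's total differs from the system's because some set of items was left out; the function names `closure` and `explain`, the `is_possible` filter hook, and the sample amounts are hypothetical.

```python
from itertools import combinations


def closure(nodes):
    """Yield every subset of the candidate nodes: 2**n subsets for n nodes."""
    names = sorted(nodes)
    for k in range(len(names) + 1):
        yield from combinations(names, k)


def explain(system_total, taxpayer_total, nodes, is_possible=lambda subset: True):
    """Return the subsets whose omission fully accounts for the discrepancy."""
    gap = system_total - taxpayer_total
    return [subset for subset in closure(nodes)
            if is_possible(subset) and sum(nodes[n] for n in subset) == gap]


# Hypothetical primitive nodes and amounts.
nodes = {"wages": 30000, "interest": 500, "dividends": 700, "capital-gain": 1200}
system_total = sum(nodes.values())          # 32400
taxpayer_total = 31200                      # the only figure the taxpayer supplies

candidates = explain(system_total, taxpayer_total, nodes)
print(len(list(closure(nodes))), candidates)
```

Here two different subsets explain the same gap, which illustrates why only a complete explanation is useful and why additional filters or data may be needed to choose between candidates.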

Current Results 

In February and March of 1988 we tested the 
Carryback Processing Assistant at an IRS 
Service Center. Two hundred cases that had 
been previously processed and reviewed were 
selected for the test. Cases were screened to 
obtain an equal distribution over seven 
attributes we considered important to 
distinguish case types. 

The Assistant processed seventy-five percent 
of the cases correctly. Fifty percent of the 
cases required discrepancy resolution, but the 
system was only able to resolve fifty percent of 
those discrepancies. Approximately two-thirds 
of the unresolved discrepancies could be 
resolved by making some additional
assumptions. But resolving the remaining 
discrepancies has caused us to design major 
revisions to the discrepancy resolution routines. 
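
For orientation, the percentages above can be turned into approximate case counts. Only the percentages and the sample size of two hundred are reported; the counts below are derived arithmetic, and the variable names are illustrative.

```python
from fractions import Fraction

cases = 200
correct = cases * Fraction(3, 4)                   # 75 percent processed correctly
needed_resolution = cases * Fraction(1, 2)         # 50 percent required resolution
resolved = needed_resolution * Fraction(1, 2)      # half of those were resolved
unresolved = needed_resolution - resolved
recoverable = unresolved * Fraction(2, 3)          # fixable with added assumptions
needing_redesign = unresolved - recoverable

print(int(correct), int(needed_resolution), int(unresolved),
      round(recoverable), round(needing_redesign))
```

The number of unresolved discrepancies equals the number of cases not processed correctly, which is consistent with unresolved discrepancies accounting for the incorrect cases, although the text does not state this directly.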

The most important technique in resolving 
discrepancies is the inductive closure we 
described above. Currently, if we are trying to 
resolve a nonbusiness income discrepancy, we 
compute a closure on the value of all primitive 
nodes above the node we are trying to resolve. 
Unfortunately, this computation does not 
consider any nodes that may have been misanalyzed
into another category (business). Nor is the
analysis of other categories sufficient, since 
multiple misclassifications could have occurred. 
Multiple misclassifications can be accounted for 
by considering all types of income together and 
then testing to find permissible classifications. 
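
The pooled approach can be sketched in Python as follows: every income item is considered at once, each item is tried in each category it is permitted to occupy, and only assignments that reproduce all of the taxpayer's category totals are kept. The function name `classifications`, the item names, the amounts, and the `permitted` table are hypothetical illustrations and are not statements of tax rules.

```python
from itertools import product


def classifications(items, permitted, targets):
    """Return every permitted assignment of items to categories that
    reproduces all of the target category totals at once."""
    names = sorted(items)
    matches = []
    for choice in product(*(permitted[name] for name in names)):
        totals = {category: 0 for category in targets}
        for name, category in zip(names, choice):
            totals[category] += items[name]
        if totals == targets:
            matches.append(dict(zip(names, choice)))
    return matches


# Hypothetical items; "permitted" lists the categories each item may occupy.
items = {"interest": 500, "royalties": 1500, "rental": 4000, "consulting": 6000}
permitted = {
    "interest": ["nonbusiness"],
    "royalties": ["business", "nonbusiness"],
    "rental": ["business", "nonbusiness"],
    "consulting": ["business"],
}
taxpayer_totals = {"business": 10000, "nonbusiness": 2000}

solutions = classifications(items, permitted, taxpayer_totals)
print(solutions)
```

Because both category totals are tested together, an item moved from one category to another is found even when a second item has also been moved, which a category-by-category search would miss.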

A related step in the resolution process is to 
acquire additional information. We initially 
developed the logical constraint system and 
backward chainer in order to obtain additional 
information to aid in discrepancy resolution. 
Additional experience indicates this may not be 
necessary. We will either ask for specific data 
items or provide an interface that solicits the 
needed data. 

In some cases, all efforts at resolution fail. If 
all pertinent data has been entered, it is likely 
that the goal or target amount is in error (the 
taxpayer's figure). The assistant can 
tentatively conclude that the taxpayer is in 
error, but it is more useful to present a display 
showing the assistant's computations and 
possible missing data items. This display 
serves as an index to the tax examiner in 
suggesting other possible resolutions. The 
examiner must decide whether to accept the 
assistant's figures or to override them. 

An interesting result to us was the system's
usefulness in detecting errors. We selected 
processed and reviewed cases so that we could 
be certain about the results and could identify 
errors made by the Carryback Processing 
Assistant. A surprising result, however, was 
that the system was able to identify or suggest 
errors made by tax examiners in 20 percent of 
the cases processed! This was a major success
for the assistant approach and established a
solid basis for further development. 

The user's ability to override the assistant's 
computations is a mixed blessing. We found 
several cases where the examiner incorrectly 
overrode the assistant's computations. We plan 
to permit the override, but incorporate 
additional displays to justify the assistant's 
computations and to solicit justification from 
the examiner. 

We were unable to obtain usable production 
results because connection of the assistant to 
the existing data retrieval system could only be 
simulated. We expect a significant decrease in 
the time required to process a case once we can 
take advantage of a direct connection to the 
data base. 



Conclusion 

There are significant barriers to applying expert 
systems technology to the domain of case 
processing. Although expert systems are well
equipped to deliver the kind of expertise that is 
needed, they founder because of the need for 
input data (just as in the case of traditional 
systems) and because of the difficulty of 
establishing clear goals or targets. 

We have successfully overcome both of these 
difficulties by using constraints to generate an 
approximately complete data set from a partial 
data set and by developing the technique of 
discrepancy resolution which allows debugging 
in the absence of a clear target. Although 
discrepancy resolution is potentially as capable 
of reaching a conclusion as a human expert, it is 
used primarily because it has access to less 
data than the human expert. Thus, it is most 
effectively implemented in the context of an 
interactive assistant. 

These techniques can be widely applied to case 
processing problems in the tax processing 
domain. They may be applied to other case 
processing domains where the input consists of 
multiple interrelated or hierarchical forms, 
namely where data on one form is used to 
support computations on another form. These
techniques may also be applied in domains with
relatively few variable values. In that case,
default values can be propagated through the 
network, and discrepancy resolution can be used 
to identify the variable values, or to test 
different default values. 